perf: slight optimization on merge states #313

yzh119 · 2024-06-19T02:40:33Z

When CUDA graph is enabled, we still launch the merge states kernels for short sequence lengths, which incurs unnecessary overhead.

This PR accelerates the merge states kernel when there is nothing to merge (`num_index_sets=1`).

We could actually write directly to the target buffer for small sequence lengths, but I'm always lazily evaluated, so I'll leave that for a future PR (if necessary).
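For readers unfamiliar with the merge step, here is a short note on why `num_index_sets=1` leaves nothing to merge. It assumes the usual attention-state merge, where each partial state is an output vector together with its log-sum-exp score. The symbols $v_i$, $s_i$ and $n$ are notation introduced here for illustration, not names from the code, and the kernel may use a different log base internally.

```latex
% Merging n partial attention states (v_i, s_i), where s_i is the log-sum-exp score:
s = \log \sum_{i=1}^{n} e^{s_i}, \qquad
v = \sum_{i=1}^{n} e^{s_i - s} \, v_i

% With n = 1 (num_index_sets = 1):
s = \log e^{s_1} = s_1, \qquad
v = e^{s_1 - s_1} \, v_1 = v_1
```

So with a single index set the merged state equals the input state, and the accumulation work can be skipped.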

zhyncs · 2024-06-19T02:52:55Z

The commit msg is interesting :P

yzh119 · 2024-06-20T08:58:20Z

never mind :)

Yard1 · 2024-06-28T20:02:22Z

include/flashinfer/attention/cascade.cuh


-  vec_t<float, vec_size> v_merged_vec;
-  v_merged_vec.fill(0.f);
+  if (num_index_sets > 1) {


wouldn't it be cleaner to just early return instead of indenting everything?

yes you are right, thanks for your suggestion.
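A minimal host-side sketch of the fast path discussed above, written in the early-return style suggested in the review. This is not the kernel in `cascade.cuh`: `MergedState`, `merge_states`, and the data layout are hypothetical names chosen for illustration, and only `num_index_sets` comes from the PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, NOT the FlashInfer kernel.
// v holds one partial output vector per index set; s holds the matching
// log-sum-exp scores. Assumes at least one index set and equal vector lengths.
struct MergedState {
  std::vector<float> v;
  float s;
};

MergedState merge_states(const std::vector<std::vector<float>>& v,
                         const std::vector<float>& s) {
  const std::size_t num_index_sets = s.size();

  // Fast path: a single index set means there is nothing to merge.
  // Returning early keeps the general path below at one indentation level.
  if (num_index_sets == 1) {
    return {v[0], s[0]};
  }

  // General path: subtract the maximum score before exponentiating
  // so the weights stay numerically stable.
  float s_max = s[0];
  for (std::size_t i = 1; i < num_index_sets; ++i) {
    s_max = std::max(s_max, s[i]);
  }

  std::vector<float> merged(v[0].size(), 0.f);
  float denom = 0.f;
  for (std::size_t i = 0; i < num_index_sets; ++i) {
    const float w = std::exp(s[i] - s_max);
    denom += w;
    for (std::size_t j = 0; j < merged.size(); ++j) {
      merged[j] += w * v[i][j];
    }
  }
  for (float& x : merged) {
    x /= denom;
  }
  return {merged, s_max + std::log(denom)};
}
```

The alternative shape, wrapping the general path in `if (num_index_sets > 1) { ... }`, behaves the same but indents the whole body, which is what the review comment was pointing at.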

ok how come upd

🤖 I have created a release *beep* *boop*

---

## [0.1.2](v0.1.1...v0.1.2) (2024-07-29)

### Bugfix

* Fix the sampling kernel bug for cu118 ([#386](#386), [#387](#387)) ([0cd499](0cd4994), [dc3f18](dc3f184))

### Features

* add llama 3.1 style rope ([#401](#401)) ([4c89dec](4c89dec))
* non-inplace rope operators ([#405](#405)) ([74ffba1](74ffba1))
* sliding window attention ([#406](#406)) ([28cffd3](28cffd3))
* support non-contiguous (packed) input for prefill kernels ([#404](#404)) ([68c3719](68c3719))

### Performance Improvements

* slight optimization on merge states ([#313](#313)) ([701c813](701c813))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>

Yard1 reviewed Jun 28, 2024


yzh119 force-pushed the stupid-optimization-on-merge-states branch from 90e5473 to cf7a7d4 Compare July 24, 2024 06:57

upd

a8bc999


yzh119 force-pushed the stupid-optimization-on-merge-states branch from cf7a7d4 to a8bc999 Compare July 24, 2024 06:58

yzh119 merged commit 701c813 into main Jul 24, 2024

github-actions bot mentioned this pull request Jul 24, 2024

chore(main): release 0.1.2 #394

Merged

yzh119 deleted the stupid-optimization-on-merge-states branch July 24, 2024 10:38

github-actions bot mentioned this pull request Jul 31, 2024

chore(main): release 0.1.4 #415

Merged

github-actions bot mentioned this pull request Dec 25, 2024

chore(main): release 0.3.0 #698

Open
